Abstract. A communication channel established from a display to a device's camera
is known as a visual channel, and it is helpful in securing key exchange protocols [' -]. In
this paper, we study how a visual channel can be exploited by a network terminal and a
mobile device to jointly verify information in an interactive session, and how such
information can be jointly presented in a user-friendly manner, taking into account
that the mobile device can only capture and display a small region, and that the user may
only want to authenticate selected regions of interest. Motivated by applications in
kiosk computing and multi-factor authentication, we consider three security models:
(1) the mobile device is trusted, (2) at most one of the terminal or the mobile
device is dishonest, and (3) both the terminal and device are dishonest but they do
not collude or communicate. We give two protocols and investigate them under the
abovementioned models. We point out a form of replay attack that renders some other
straightforward implementations cumbersome to use. To enhance user-friendliness, we
propose a solution using visual cues embedded into the 2D barcodes and incorporate
the framework of "augmented reality" for easy verification through visual inspection.
We give a proof-of-concept implementation to show that our scheme is feasible
in practice.

Keywords: Visual channel protocol, Sub-region authentication, User-friendly verification.

1 Introduction

Securing a connection to a server through an untrusted network terminal is challenging even if
the user has an additional authentication factor such as a one-time-password token, a smartcard, or
a mobile phone. One of the hurdles is the difficulty of securely passing information from the
terminal to the device, and of presenting the jointly verified authentic information to the user
in a user-friendly manner. Traditional channels for connecting the device and the terminal,
such as wireless or plug-and-play connections, are subject to various man-in-the-middle
attacks. Even if a secure channel can be established, it is still not clear how the
additional device can help in authenticating subsequent messages rendered on the untrusted
terminal's display.

A number of recent works utilize the cameras in mobile devices to provide an alternative
real-time communication channel from a display unit to a mobile device: messages
are rendered on the display unit in the form of, say, 2D barcodes, which are then captured
and decoded by the mobile device via its camera. Although such a visual channel could be
eavesdropped by "over-the-shoulder" attacks, it is arguably impossible to modify or insert
messages, and the channel is thus secure against man-in-the-middle attacks. Visual channels
have been exploited in a few works to verify the session key exchanged over an unsecured channel,
for instance seeing-is-believing proposed by McCune et al. [18]. There are also proposals for
verifying an untrusted display; for example, Clarke et al. propose verifying the display screen
using a stabilized camera device [ ]. In this paper, we take a step further by investigating the
authentication of interactive sessions, taking into consideration that many cameras are unable to
cover the whole screen in a single view with sufficient precision. An example of an interactive
session is an online banking application where a user can browse and selectively view previous
transactions, and carry out new transactions. A typical screenshot would contain important
information such as the user's account information, and less sensitive information such as
advertisements, help information, and navigation information, as shown in Figure 1(a).

During an interactive session, after a session key $k_S$ has been securely established between
the server and the mobile device (for example, using seeing-is-believing [18]), there
could be many subsequent messages that require protection by $k_S$. These
messages may need to be rendered over different pages, or in a scrolling webpage where
not all of them are visible at the same time. We remark that it is not clear how to protect
them. For instance, one may render the messages as 2D barcodes, each protected by the
same $k_S$. To view the message in a 2D barcode, the user moves the mobile device over the
barcode, and the device captures, authenticates and displays the message on its display
panel. However, as there are many barcodes associated with the same key, a
dishonest terminal can perform a "rearrangement" attack: it replays barcodes or shows barcodes
in the wrong order.
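
The rearrangement attack can be illustrated with a minimal sketch. It assumes that each barcode payload is protected independently by a MAC under the same session key; HMAC-SHA256 and the helper names `protect` and `verify` are illustrative assumptions, not the construction used in this paper. Every block verifies on its own, so the device alone cannot notice that the order has changed.

```python
import hashlib
import hmac
from typing import List, Tuple

Block = Tuple[bytes, bytes]  # (message, tag): stands in for one barcode payload


def protect(key: bytes, message: bytes) -> Block:
    """Authenticate one message independently of all the others."""
    return message, hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, block: Block) -> bool:
    """Check a single block, as the mobile device would after a capture."""
    message, tag = block
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


session_key = b"hypothetical session key"
rows: List[bytes] = [b"pay Alice 10", b"pay Bob 2000", b"pay Carol 5"]
blocks = [protect(session_key, r) for r in rows]

# A dishonest terminal shows the same authentic blocks in a different order.
rearranged = [blocks[1], blocks[0], blocks[2]]

all_valid = all(verify(session_key, b) for b in rearranged)
order_changed = [m for m, _ in rearranged] != rows
```

Both `all_valid` and `order_changed` are true here, which is why position information has to be bound to each barcode in a way the user can check.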

The above attack arises due to the limitation that the camera is unable to capture the 
whole screen with sufficient precision, and not all messages can be rendered together in 
a single screenshot. We treat the problem as the authentication of messages rendered in 
a sequence of large 2D regions, where only the region within a small rectangular window can be
captured at one time. There are a few straightforward methods to overcome the rearrangement
attack. For instance, one may prevent the attack by requiring the user to scan all the
barcodes with his mobile device, and all the messages will be authenticated and rendered 
by the mobile device. However, it is troublesome for the user to scan all the barcodes, and 
there are situations where the user only wants to view some, but not all, of the messages. 
In addition, it is less preferred to navigate and browse the messages (e.g. a large table of 
transactions) within the relatively small display panel. In Section 6, we will discuss a few 
other straightforward methods and their limitations. 

Our solution is to use a barcode scheme that, given a message $m$ and a visual cue $v$, is
able to produce a barcode image that not only carries $m$ as its payload, but also visually
appears as $v$ (see examples in Figure 1(b) and Figure 1(c)). Our paper realizes such a barcode
scheme using a technique borrowed from fragile image watermarking [12]. To embed a long
message into several barcodes, our main idea is to have a visual cue on each barcode
indicating its position. By visually inspecting the visual cues, the user can readily verify
that the barcodes are in the correct arrangement. For example, in Figure 1(b), the visual
cues are numbers increasing by 1 from left to right, top to bottom. The black dot
beside the number "2" indicates that the barcode is at the end of a row, and the black block
beside the number "10" indicates that it is the last (i.e. bottom-right) barcode. With the
arrangement of barcodes verified, the user can then browse selected barcodes independently
with his mobile device.

In our security analysis, we consider the four-party setting where a user, who has a
mobile device, wants to interact with a server via a network terminal. We focus on three
security models. In the first model, the Internet terminal, including its CPU, keyboard and
display unit, is untrusted by the user, whereas the mobile device is trusted. This model is
motivated by the challenging problem of securing kiosks [13, 16], where kiosks are untrusted
public network terminals such as workstations in an Internet cafe.

In the second model, motivated by two-factor authentication, we consider scenarios where 
at least one of the terminal or mobile device is honest. We found that under the first model, 
it is possible to provide both confidentiality and authenticity; whereas under the second 
model, although authenticity can be achieved, it is not clear how to achieve confidentiality. 

In the third model, we take one more step beyond two-factor authentication and consider
a tricky setting where both the terminal and mobile device could be dishonest, but they do
not collude in the sense that they do not know how to communicate with each other. This
model is motivated by scenarios where the terminal and mobile device are compromised,
but independently by two different adversaries, for instance, a dishonest mobile device that
always says "authentic" for whatever authentication it is supposed to carry out, and a
network terminal that is tasked to deceive the user into accepting a message given to the terminal.
To detect such a dishonest mobile device, our proposed method requires the mobile device to
extract and produce a human-readable proof from the authentication tag. A corresponding
proof is also shown on the terminal's display, and hence the user can visually verify whether
they are consistent, as shown in Figure 1(c).

(a) A bank transaction webpage. (b) Method 1: mobile device is trusted. (c) Method 2: mobile device could be dishonest.

Fig. 1. Illustration of our schemes: Figure 1(a) is the bank transaction screenshot which contains
a sensitive transaction table to be protected. Figure 1(b) illustrates method 1 where the sensitive
table is replaced by barcodes, and the mobile device captures, verifies and decodes part of the
table. Figure 1(c) illustrates method 2 where the sensitive table is displayed with barcodes, and the
table is rendered on both terminal and mobile to be compared by the user. The decoded tables are
generated by our proof-of-concept implementation and then cut and pasted to produce the
illustration. The green boxes show the captured region and the red dots are for image registration.

In addition to security requirements, user experience is also important. Requiring the 
user to take snapshots of the screen is rather disruptive from the user's point of view. We
employ augmented reality to provide better user experience in verification. The design of
our 2D barcode and the subregion authentication takes usability into consideration and fits
nicely in the framework of augmented reality. One example is shown in Figure 1(b). The
screenshot displayed by the terminal is a combination of sensitive data and non-sensitive data 
like advertisement and menu. The sensitive data are replaced by 2D barcodes with visual 
cue as described before. The user treats the mobile device as an inspection device and places 
the mobile phone over the region to be inspected. In realtime, the mobile device captures 
and verifies the 2D barcode. If it is authentic, the decrypted message is displayed. The non- 
sensitive portion of the screenshot is also displayed as it is to help the user to navigate. We 
give a proof-of-concept system where we use a laptop equipped with a webcam to simulate 
the mobile device to show the feasibility of our methods. 

Organization We formally define our problem and three adversary models in Section 2. 
Assuming the existence of a barcode scheme that is secure against rearrangement attack, 
we propose two protocols and analyze them under the three adversary models in Section 3. 
We give a construction for the required barcode scheme using visual cues in Section 4 
and discuss the design of visual cue symbols in Section 5. We compare our solutions with 
possible alternative methods in Section 6. We describe our proof-of-concept implementation 
in Section 7 and measure its performance in Section 8. A discussion of existing work is given 
in Section 9. Section 10 gives a conclusion of our paper. 



2 Models and Formulation 



There are four parties involved in our problem: the user, the server, the mobile device and 
the network terminal. Let us call them User, Server, Mobile, and Terminal respectively. 
In our framework, the term "user" literally refers to a person, and the mobile device is
equipped with a camera, input device, a small display unit and a chip that can perform 
decoding of barcodes. 

The communication channels among the four parties are as shown in Figure 2. Note 
that there is no direct communication link between Mobile and Server. With 3G mobile 
network and WiFi connection widely available, one may argue that the model should consider 
such a link. Nevertheless, there are situations where the connection is not available due to
cost or other constraints. In addition, there are also security concerns if the mobile device
has an Internet connection during the transactions: if Mobile can directly send messages to
Terminal, they may collude and conduct a coordinated attack. Table 1 gives a summary of
our notations. 

We consider the following security models for the channel between Server and User: 

1. Model 1: Terminal is not trusted by User, but Mobile is trusted and we want to protect 
both confidentiality and authenticity. 

2. Model 2: At least one of Terminal and Mobile is honest and we want to protect au- 
thenticity. 

3. Model 3: Both Terminal and Mobile could be dishonest but they do not collude and 
we want to protect authenticity. 

In Model 3, we treat the dishonest Terminal and Mobile as two different adversaries
$A_T$ and $A_M$ with two different goals. $A_T$ is the dishonest terminal and its intention is to
trick the user into believing that a given message $m'$ is authentic. The actual value of $m'$ is
not determined prior to the connection. We can view it as a randomly chosen message that
is passed to $A_T$. The adversary $A_M$ is the dishonest mobile and has an easier goal: it is
free to construct any message and trick the user into wrongly believing that it is authentic. An
example of $A_M$ is one that always accepts whatever verification it is tasked to do. To capture
the notion that they do not collude, we impose the restriction that $A_T$ and $A_M$ do not know
how to communicate with each other, and the forged message $m'$ is randomly chosen and
held by one party. Hence, we exclude the attack where $A_T$ covertly sends the message $m'$ to
$A_M$ through the visual channel.
Fig. 2. The communication channels among the four parties.



3 Protocols 

We now give our proposed protocols for securing the communication between Server and
User, assuming we have a barcode embedding technique that can protect the integrity and
confidentiality of its payload, and that a visible visual cue can be rendered onto the barcode to
indicate the barcode location as in Figure 1(b). Given a message $m$, a visual cue $v$, and a
session key $k_S$, let us write the barcode (represented as an image) as $B(k_S, m, v)$. For clarity
of presentation, we first consider the case where the message can be embedded into one
barcode block whose size is small enough to be entirely captured by Mobile's camera with
sufficient precision. Thus, we take the visual cue to be a single dot, indicating to the user that
there is only a single barcode to be read. We will later study the case of multiple messages
in Section 4 and Section 5.

Table 1. Summary of Notations.

$m_U$                        The message from User to Server.
$m_S$                        The message from Server to User.
$k_T$                        The key used in the message authentication scheme.
$k_E$                        The key used in encryption.
$k_V$                        The key used in embedding the visual cue.
$k_S$                        The session key, consisting of the tuple $(k_T, k_E, k_V)$.
$v$                          A visual cue symbol that carries the location information.
$B(k_S, m, v)$               A barcode image encoding a message $m$ and visual cue $v$ with key $k_S$.
$T_{k_T}(m)$                 An authentication tag of a message $m$ with key $k_T$.
$E_{k_E}(m)$                 An encryption of a message $m$ with key $k_E$.
$\mathrm{ECC}(m)$            An error correcting encoding of a message $m$.
$A \to B : m$                The entity $A$ sends a message $m$ to another entity $B$.
$A \xrightarrow{C} B : m$    The entity $A$ sends a message $m$ to $B$ using $C$ as a relay point.

We assume that Server has already established a long-term shared key with Mobile
when the user registers an account with the server. In addition, for Models 2 and 3, we
assume that User has established a password with Server. Before each interactive session,
Server authenticates User and Mobile to obtain a session key $k_S$. A secure key exchange can
be derived from a modified seeing-is-believing [18] in combination with the methods proposed
in this section. Due to space constraints, we do not include the details in this paper.

3.1 Server to User 

Consider the case where Server wants to send a message $m_S$ to User. We propose two
methods, denoted MS1 and MS2 (message from server), where method MS1 is more
user-friendly than MS2, but requires that Mobile is trusted.

MS1. To send a message $m_S$ to User, the following steps are carried out. (1) Server
generates a barcode image $B(k_S, m_S, v)$ and sends the barcode to Terminal. (2) Terminal
displays the barcode. (3) User inspects and verifies that the visual cue is valid. (4) Mobile
captures the barcode and rejects if the barcode is not authentic. (5) Mobile renders $m_S$ on its
display. (6) User reads and accepts $m_S$ from Mobile's display panel. Below is a summary
for MS1:

1. Server $\to$ Terminal: $B(k_S, m_S, v)$;

2. Terminal $\to$ User: $v$;

3. User verifies $v$;

4. Terminal $\to$ Mobile: $B(k_S, m_S, v)$;

5. Mobile $\to$ User: $m_S$;

6. User accepts $m_S$.



MS2. The main difference between this method and MS1 is that the message $m_S$
is displayed by both Terminal and Mobile for User to verify, and thus User is able to detect
if one of them is dishonest. (1) Server first generates a barcode image $B(k_S, m_S, v)$, then
sends both the barcode image and the message $m_S$ to Terminal. (2) Terminal displays
the barcode, side by side with $m_S$. (3) User inspects and verifies that the visual cue is valid. (4)
Mobile captures the barcode and rejects if the barcode is not authentic; otherwise, it displays
$m_S$. (5) User reads $m_S$ from Mobile's display panel and from Terminal's display. (6) User accepts
$m_S$ if the $m_S$ in step (2) is consistent with the $m_S$ in step (4). Below is a summary for MS2,
where $m_{S1}$ and $m_{S2}$ denote the messages shown by Terminal and Mobile respectively:

1. Server $\to$ Terminal: $B(k_S, m_S, v)$, $m_S$;

2. Terminal $\to$ User: $v$, $m_{S1}$;

3. User verifies $v$;

4. Terminal $\to$ Mobile: $B(k_S, m_S, v)$;

5. Mobile $\to$ User: $m_{S2}$;

6. User accepts $m_{S1}$ if $m_{S1} = m_{S2}$.
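
The comparison in the final step of MS2 can be sketched as follows. This is only an illustration under stated assumptions: the barcode is modelled as a payload with an HMAC-SHA256 tag, the visual cue is omitted, and the names `make_barcode`, `honest_mobile` and `user_accepts` are hypothetical.

```python
import hashlib
import hmac
from typing import Optional, Tuple

KEY = b"hypothetical session key shared by Server and Mobile"


def make_barcode(key: bytes, message: bytes) -> Tuple[bytes, bytes]:
    """Stand-in for B(k_S, m_S, v): the payload plus its authentication tag."""
    return message, hmac.new(key, message, hashlib.sha256).digest()


def honest_mobile(key: bytes, barcode: Tuple[bytes, bytes]) -> Optional[bytes]:
    """Step 4: display the payload only if the tag verifies."""
    message, tag = barcode
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return message if hmac.compare_digest(tag, expected) else None


def user_accepts(shown_by_terminal: bytes, shown_by_mobile: Optional[bytes]) -> bool:
    """Step 6: accept only if Mobile did not reject and both displays agree."""
    return shown_by_mobile is not None and shown_by_terminal == shown_by_mobile


m_s = b"balance: 1,000"
barcode = make_barcode(KEY, m_s)

honest_run = user_accepts(m_s, honest_mobile(KEY, barcode))
# Dishonest Terminal: shows altered text next to the authentic barcode.
terminal_cheats = user_accepts(b"balance: 9,000", honest_mobile(KEY, barcode))
# Dishonest Mobile: ignores the barcode and displays text of its own choosing.
mobile_cheats = user_accepts(m_s, b"balance: 0")
```

Only `honest_run` is true. Either device deviating on its own produces a mismatch that User can see.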



3.2 User to Server 

Now we consider the following methods MU1 and MU2 (message from user) for sending
the message $m_U$ to Server. Method MU1 protects both confidentiality and authenticity of
the message, whereas method MU2 protects only the authenticity but involves less user
operation.

MU1. MU1 consists of the following steps to send a message $m_U$ to Server. (1) User enters
$m_U$ into Mobile. (2) Mobile computes and shows User the encrypted form $E_{k_E}(m_U) \| T_{k_T}(E_{k_E}(m_U))$
in readable characters (e.g. using uuencode). (3) User sends the displayed string to Server
through Terminal's input device. (4) Server accepts $m_U$ if the tag is valid. Below is a
summary for MU1:

1. User $\to$ Mobile: $m_U$;

2. Mobile $\to$ User: $E_{k_E}(m_U) \| T_{k_T}(E_{k_E}(m_U))$;

3. User $\xrightarrow{\text{Terminal}}$ Server: $E_{k_E}(m_U) \| T_{k_T}(E_{k_E}(m_U))$;

4. Server accepts $m_U$ if the tag $T_{k_T}(E_{k_E}(m_U))$ is valid.
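
A minimal sketch of the string that Mobile shows in MU1, under loudly stated assumptions: the cipher is a toy SHA-256 counter-mode keystream standing in for a vetted IND-CPA scheme, the tag is HMAC-SHA256 truncated to 10 bytes to keep the string short, and Base32 replaces uuencode as the readable encoding. None of these choices come from the paper, and the toy cipher should not be used in practice.

```python
import base64
import hashlib
import hmac
import os
from typing import Optional

TAG_LEN = 10   # truncated tag length (an assumption, chosen to shorten typing)
NONCE_LEN = 8


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy SHA-256 counter-mode keystream; a placeholder for a real cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def mobile_encode(k_e: bytes, k_t: bytes, m_u: bytes) -> str:
    """Step 2: E_{k_E}(m_U) || T_{k_T}(E_{k_E}(m_U)) in readable characters."""
    nonce = os.urandom(NONCE_LEN)
    body = bytes(a ^ b for a, b in zip(m_u, keystream(k_e, nonce, len(m_u))))
    ciphertext = nonce + body
    tag = hmac.new(k_t, ciphertext, hashlib.sha256).digest()[:TAG_LEN]
    return base64.b32encode(ciphertext + tag).decode("ascii")


def server_decode(k_e: bytes, k_t: bytes, typed: str) -> Optional[bytes]:
    """Step 4: check the tag first, decrypt only if it is valid."""
    raw = base64.b32decode(typed)
    ciphertext, tag = raw[:-TAG_LEN], raw[-TAG_LEN:]
    expected = hmac.new(k_t, ciphertext, hashlib.sha256).digest()[:TAG_LEN]
    if not hmac.compare_digest(tag, expected):
        return None
    nonce, body = ciphertext[:NONCE_LEN], ciphertext[NONCE_LEN:]
    return bytes(a ^ b for a, b in zip(body, keystream(k_e, nonce, len(body))))
```

Terminal only ever sees the typed Base32 string, so altering any character of it causes `server_decode` to return `None`.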



MU2. In scenarios where the confidentiality of $m_U$ is not required, we can employ a more
user-friendly protocol MU2 as follows: (1) User enters $m_U$ through Terminal's input device,
and Terminal forwards $m_U$ to Server. (2) Server generates a barcode $B(k_S, m_U \| c, v)$, where
$c$ is a randomly generated nonce and $\|$ denotes concatenation. Server sends the barcode to
Terminal. (3) Terminal displays the barcode, and User visually verifies that the visual cue
$v$ is correct. (4) Mobile captures the barcode and rejects if the barcode is not authentic. (5)
Mobile renders the message $m_U$ and the nonce $c$ on its display. (6) If $m_U$ is consistent with
the message User entered in step (1), User enters $c$ into Terminal, and Terminal forwards it
to Server. (7) Server rejects if the nonce $c$ is wrong.

Although it involves more steps, MU2 is less tedious from User's point of view, since
User does not need to enter $m_U$ using Mobile's input device. The corresponding steps for
MU2 are summarized below:

1. User $\xrightarrow{\text{Terminal}}$ Server: $m_U$;

2. Server $\to$ Terminal: $B(k_S, m_U \| c, v)$;

3. Terminal $\to$ User: $v$;

4. Terminal $\to$ Mobile: $B(k_S, m_U \| c, v)$;

5. Mobile $\to$ User: $m_U$, $c$;

6. User $\xrightarrow{\text{Terminal}}$ Server: $c$;

7. Server accepts $m_U$ if $c$ is consistent, rejects otherwise.
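
The nonce confirmation in MU2 can be sketched as below. The `Server` class, the `||` byte separator, the six-character hexadecimal nonce and the HMAC-based barcode stand-in are all assumptions made for illustration; the visual cue is omitted.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional, Tuple


class Server:
    def __init__(self, key: bytes) -> None:
        self.key = key
        self.pending: Dict[str, bytes] = {}  # nonce -> message awaiting confirmation

    def issue_barcode(self, m_u: bytes) -> Tuple[bytes, bytes]:
        """Step 2: bind the received message to a fresh nonce c."""
        c = secrets.token_hex(3)  # short enough for the user to retype
        payload = m_u + b"||" + c.encode("ascii")
        tag = hmac.new(self.key, payload, hashlib.sha256).digest()
        self.pending[c] = m_u
        return payload, tag  # stand-in for B(k_S, m_U || c, v)

    def confirm(self, c: str) -> Optional[bytes]:
        """Step 7: accept m_U only if the returned nonce is the issued one."""
        return self.pending.pop(c, None)


def mobile_display(key: bytes, barcode: Tuple[bytes, bytes]) -> Optional[Tuple[bytes, str]]:
    """Steps 4-5: verify the barcode, then show m_U and c to the user."""
    payload, tag = barcode
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    m_u, c = payload.rsplit(b"||", 1)
    return m_u, c.decode("ascii")
```

If Terminal forwards a different message than the one User typed, Mobile displays that different message, User notices the mismatch and withholds $c$, and Server never confirms the transaction.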



3.3 Analysis 

In this section, we analyze our methods under different adversary models. 

Model 1 (Mobile is trusted) In Model 1, we use MU1 for sending messages to Server,
and MS1 for Server to send messages to User, to achieve confidentiality and authenticity
of the communication channel.

For both methods, Terminal plays the role of a relay point for passing messages, and thus
a malicious Terminal is a man-in-the-middle. Hence, this is the classical setting where
two end points (Server and Mobile) sharing a key want to communicate over a
public channel. Standard cryptographic techniques (encryption and message authentication codes)
can secure the channel and provide both confidentiality and authenticity.

It is clear that MU2 and MS2 cannot protect confidentiality under this model, as the
messages are sent in the clear through Terminal; thus they are not suitable in this model.

Model 2 (At least one is honest) In Model 2, we use MU2 to send messages to Server,
and MS2 for Server to send messages to User. We want to achieve authenticity of the
messages $m_U$ and $m_S$. We are not interested in confidentiality here. Since we are not
sure which of Mobile and Terminal is dishonest, it is not clear whether confidentiality can
be achieved; investigating this is interesting future work.

Suppose Terminal is dishonest. In both directions of the communication, we can treat
the barcode as the MAC of the message, $m_U$ and $m_S$ respectively, and Terminal does not
have the key. Similar to the analysis for Model 1, this is a classical setting and the authenticity
of the message is inherited from the MAC used in the barcode construction.

On the other hand, let us consider the case where Mobile is dishonest. In MU2,
Terminal is honest and will forward $m_U$ to Server as it is; thus, it is impossible for Mobile
to modify $m_U$ without Server noticing. Similarly, in MS2, since the actual message $m_S$ is
displayed by the honest Terminal, User can compare the displayed messages and thus any
modification can be detected.

Note that MU1 and MS1 are not secure in this model: if Mobile is dishonest and changes
the message to $m'$, there is no way for User or Server to detect it.

Model 3 (No collusion) It turns out that the protocols used in method 2, i.e. MU2
and MS2, can achieve authenticity in this model as well.

Let us first analyze MU2. Recall that the goal of a dishonest Terminal is to trick Server
into accepting a message $m'_U$. To do so, Terminal must send Server the message $m'_U$ and obtain a
barcode that contains $m'_U$ and $c$. Server accepts only if the verification code $c$ is presented.
Since $c$ is randomly chosen, Terminal is unlikely to succeed in guessing $c$. Therefore, it
needs to get $c$ from User, who enters $c$ only if Mobile displays the message that User
entered. Without any hint from Terminal, Mobile is not able to display
the message that User is expecting.

Now let us analyze MS2. In this case the dishonest Terminal wants to trick User into
accepting a message $m'_S$. To achieve this goal, it must display $m'_S$ side by side with the
barcode. As Terminal does not know the key $k_S$, it is unable to forge the barcode. Now,
consider the dishonest Mobile. Recall that there is no communication from Terminal
to Mobile; hence Mobile is unable to display the message $m'_S$, which is required to trick User
into accepting $m'_S$.



Table 2. Summary of Methods.

           MU1         MU2          MS1             MS2
Model 1    C, A, U1    A, U1, U2    C, A, U1, U2    A, U1, U2
Model 2    N           A, U1, U2    N               A, U2
Model 3    N           A, U1, U2    N               A, U2

Note: C, A, N are related to security goals and U1, U2 are related to usability.
C: confidentiality is achieved; A: authenticity is achieved; N: none of C and A can be achieved.
U1: no user comparison of messages is required; U2: no user input via Mobile's input device is required.

Table 2 summarizes the security and user friendliness of our methods under different 
models. 

4 Visual Channel 

A main component in building our visual channel is the construction of a 2D barcode with
visual cues: given a secret key $k_S = (k_T, k_E, k_V)$, a message $m$, and a visual cue symbol $v$,
we want to produce a 2D barcode $B(k_S, m, v)$ such that the cue $v$ is clearly visible, and the
message $m$ can be extracted under noise. In addition, there are security requirements
on the confidentiality of $m$ and the integrity of $m$ and $v$: any modification of $m$ or $v$ must be
detected.

4.1 Construction Overview 

There are a number of stages in the construction of the visual channel:

1. (Encrypt-then-MAC): Given $m$ and the keys $k_E$, $k_T$, the message $m$ is protected using
encryption and a MAC with keys $k_E$ and $k_T$ respectively, giving $m_0 = E_{k_E}(m) \| T_{k_T}(E_{k_E}(m))$.

2. (Error correcting): An error correcting code is then applied to the result $m_0$, giving
$m_1 = \mathrm{ECC}(m_0)$.

3. (Embedding visual cue): Given the message $m_1$, a key $k_V$, and a visual cue $v$ represented
as a 2D array of bits, $m_1$ is embedded into a larger 2D array of bits $I$ which visually
appears as $v$. Section 4.2 gives details of the embedding process.

4. (Adding control points and rendering): A set of control points (red dots in Figure 1(b))
is then added around $I$ for image registration purposes.

Thus, our barcode is a black and white image with red pixels.

4.2 Encoding with Visual Cue 

When a message is too large, multiple barcodes are required to encode it. As mentioned
in the introduction, multiple barcodes protected by a single session key are subject to the
"rearrangement" attack. To detect the attack, we propose binding location information to
the barcode using a visual cue. This section gives a method for embedding the visual cue. Note
that the process of embedding a visual cue into a barcode is essentially digital watermarking,
where the visual cue is the host, and the barcode is a message to be "watermarked" into the
host.



Given an $n$-bit message $m_1$, let us arrange it as an $x$ by $y$ binary matrix where $n = x \cdot y$ and
$x$ is even. Let us assume that the given visual cue is an $x/2$ by $y$ pixel image where each pixel
is either 0 (representing a black pixel) or 1 (representing a white pixel). Therefore, every 2
bits in $m_1$ are associated with 1 pixel of the visual cue, and together they can be represented
with 3 black-and-white pixels in the final barcode. The 3 pixels are arranged in an "L" shape
as shown in Figure 3(a). Let us call the 3 pixels an L-block. The $2^3$ combinations of values
in an L-block are divided into two groups: $W$ and $B$. The L-blocks in $W$ have more white
pixels and thus appear "white". Conversely, the L-blocks in $B$ appear
"black".
(a) Two groups of L-blocks.
(b) Tile up with L-blocks.
Fig. 3. L-blocks for constructing visual cues



Given a binary value $v_1 \in \{0, 1\}$ of a pixel of the visual cue image, we want to encode
two bits $(b_1, b_2)$ into a three-pixel L-block, such that the brightness of the L-block can be
adjusted according to $v_1$. For instance, if $v_1 = 1$, the encoding outputs only elements in $W$.
Since there are 4 elements in $W$, it is possible to encode the two bits $b_1$ and $b_2$. Apart from the
value of $v_1$, there is no further constraint on how the encoding of $(b_1, b_2)$ to the 4 elements
in $W$ is to be done. In order to prevent the adversary from modifying the appearance of
the visual cue, the mapping from the 2 bits $(b_1, b_2)$ to the three pixels of the associated
L-block, $(p_1, p_2, p_3)$, has to be kept secret. Hence, the key space for encoding a bit pair is
$4! \times 4! = 576$.

To decode a barcode, Mobile applies the decoding and decryption functions in reverse
order and ignores the bit $v_1$. That is, it first extracts the bit pairs from every L-block to
obtain the message $m'$. Next, error correction is applied and the authenticity of the message
can be verified.
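
The L-block encoding and decoding described above can be sketched as follows. Several details are assumptions made for illustration: $W$ and $B$ are taken to be the patterns with a majority of white and black pixels respectively (the exact grouping in Figure 3(a) may differ), and the secret bit-pair tables are derived per block from $k_V$ by seeding Python's non-cryptographic `random` module with a SHA-256 digest. The paper only requires that the mapping be secret; it does not specify this derivation.

```python
import hashlib
import itertools
import random
from typing import Dict, List, Tuple

Pattern = Tuple[int, int, int]           # three pixels, 0 = black, 1 = white
PATTERNS: List[Pattern] = list(itertools.product((0, 1), repeat=3))
W = [p for p in PATTERNS if sum(p) >= 2]  # four patterns that appear "white"
B = [p for p in PATTERNS if sum(p) <= 1]  # four patterns that appear "black"
BIT_PAIRS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def block_tables(k_v: bytes, index: int) -> Dict[int, Dict[Tuple[int, int], Pattern]]:
    """Derive the secret bit-pair -> pattern tables for the L-block at `index`."""
    digest = hashlib.sha256(k_v + index.to_bytes(4, "big")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    white, black = W[:], B[:]
    rng.shuffle(white)  # one of the 4! orderings of W
    rng.shuffle(black)  # one of the 4! orderings of B
    return {1: dict(zip(BIT_PAIRS, white)), 0: dict(zip(BIT_PAIRS, black))}


def encode(k_v: bytes, bits: List[int], cue: List[int]) -> List[Pattern]:
    """Embed two payload bits per cue pixel; requires len(bits) == 2 * len(cue)."""
    pairs = list(zip(bits[0::2], bits[1::2]))
    return [block_tables(k_v, i)[v][pair] for i, (pair, v) in enumerate(zip(pairs, cue))]


def decode(k_v: bytes, blocks: List[Pattern]) -> List[int]:
    """Recover the payload bits; the cue bit itself is ignored."""
    bits: List[int] = []
    for i, pattern in enumerate(blocks):
        tables = block_tables(k_v, i)
        inverse = {p: pair for table in tables.values() for pair, p in table.items()}
        bits.extend(inverse[pattern])
    return bits
```

Because $W$ and $B$ are disjoint, every pattern decodes to exactly one bit pair, while the brightness of each block (majority white or majority black) reproduces the cue pixel.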



4.3 Security Analysis 

We would like our barcode scheme to achieve the following properties: (1) authenticity and
confidentiality of $m_S$ and (2) the integrity of the visual cue.



Authenticity and confidentiality of message The authenticity and confidentiality
of the message embedded in our barcode scheme rely on the security of the underlying
encryption and message authentication schemes. Bellare et al. [A] show that when the
encryption achieves indistinguishability under chosen-plaintext attack (IND-CPA), and the
message authentication scheme $T_{k_T}$ is strongly unforgeable (SUF-CMA), then the
Encrypt-then-MAC composition method achieves IND-CPA, INT-CTXT (integrity of ciphertexts)
and IND-CCA (indistinguishability under (adaptive) chosen-ciphertext attack).



Integrity of visual cue An adversary may try to modify some L-shape blocks such
that the visual cues on two barcode blocks are swapped, and thus he can rearrange the two
blocks without being detected. As discussed in Section 4.2, any modification of an L-shape
block's brightness has a $\frac{1}{4}$ chance of not being detected. Suppose at least $\beta$
L-shape blocks have to be modified in order to deceive the user; then the chance of not
being detected is $(\frac{1}{4})^{\beta}$, where $\beta$ depends on the size of a barcode block and the visual
cue design.

However, the above analysis does not hold when we consider the whole process of decoding,
where error correction is included. Recall that, due to inevitable noise, we need to
apply error correction before extracting $E_{k_E}(m_S)$. Therefore, when a small number of L-shape
blocks are corrupted, the payload $m_1$ can still be correctly decoded. Hence, the choice of
error-correcting code and the design of the cues cannot be made separately. Furthermore, some
error-correcting codes can correct more errors than their guaranteed level in some situations.
Due to the concern of forgery, it is important not to correct those errors.

To prevent an adversary from making small changes that can deceive the user and yet get
verified, one design consideration of the visual cue is to choose symbols with a large mutual
Hamming distance from each other. In our implementation, to be described in Section 7, we
use numerical digits as visual cues, where the minimum Hamming distance between two symbols is
14 L-blocks (for example, between "1" and "7", or "0" and "8"). We choose the parameters
of the error correcting code so that it tolerates 4 bits of noise for every 63 bits. Note that
modifying an L-block may result in two flipped bits; thus, the probability that an attacker
can modify the visual cue of a barcode into another is less than $\Phi(3; 14, 0.75) \approx 3.98 \times 10^{-5}$, where
$\Phi$ is the cumulative distribution function of the binomial distribution $B(14, 0.75)$.
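
The binomial bound can be checked numerically. The sketch below simply evaluates the cumulative distribution function of a binomial distribution with 14 trials and success probability 0.75 at 3; the function name `binomial_cdf` is illustrative.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Return P[X <= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Probability that at most 3 of the 14 modified L-blocks decode incorrectly.
undetected = binomial_cdf(3, 14, 0.75)
```

The exact value is $10690 / 4^{14}$, which is roughly $3.98 \times 10^{-5}$.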

Modifying control points The adversary may try to modify the control points and 
this may cause failure in decoding, giving a string of "random" bits which is unlikely to pass 
the MAC authentication check. Hence, modifications of control points at most amount to a 
denial of service attack, which is not our main concern. 

5 Visual Cues for Verification of Multiple Barcodes 

In this section, we discuss a few designs of visual cues, in particular, for barcodes appearing
in a linear sequence, and barcodes rendered as a table. Recall that the main purpose of the
visual cue is to bind location information to the barcodes, so that User can visually verify
that the barcodes are in the correct arrangement.

Linear Sequential Barcodes. Consider a sequence of barcodes appearing in the order
B1, B2, ..., Bn. The order of appearance gives implicit structure to the encoded message.
For instance, the message could be a string divided into substrings where each substring
is encoded in a single barcode. Hence, it is important to protect the order of appearance,
even if the user may not be interested in viewing all of them. A natural visual cue would be a
counter starting from 1, that is, the visual cue of block Bi is i. To indicate the end of the
sequence, the last block contains a special symbol, say "." in our example.
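The counter cue can be illustrated with a minimal sketch. It assumes that the last block carries its counter followed by the end symbol "."; the function names `linear_cues` and `window_ok` are hypothetical and only model the check that User performs by eye on whatever consecutive blocks are currently in view.

```python
END_MARK = "."  # end-of-sequence symbol from the example in the text


def linear_cues(n):
    """Cue labels for blocks B1..Bn: a counter, with END_MARK on the last block."""
    return [str(i) + (END_MARK if i == n else "") for i in range(1, n + 1)]


def parse_cue(cue):
    """Split a cue label into (counter value, carries end mark?)."""
    ended = cue.endswith(END_MARK)
    return int(cue.rstrip(END_MARK)), ended


def window_ok(cues):
    """Check a run of consecutive visible cues.

    The counters must increase by exactly 1, and no block may follow
    a block that carries the end mark.
    """
    parsed = [parse_cue(c) for c in cues]
    for (value, ended), (next_value, _) in zip(parsed, parsed[1:]):
        if ended or next_value != value + 1:
            return False
    return True
```

For example, a user looking only at blocks 2 to 4 of a four-block sequence would see the cues "2", "3", "4." and accept them, while a window showing "1" followed by "3" reveals a deleted block.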

Barcodes in Table Structure. Consider a table of messages where each message is 
encoded in a barcode. The barcodes are depicted in the natural table arrangement: for any 2 
messages in the same row, the corresponding barcodes are also in the same row, and likewise 
for columns. To protect such correspondence, we propose the following rules of assigning the 
visual cue: 




Fig. 4. Illustration of alternative methods (for simplicity, only the barcodes and mobile device are
shown here). (a) Mobile captures every block, then verifies and renders the whole message.
(b) Mobile displays the location (row/column) information encoded in the barcodes.

R1 The numerical value of the visual cue symbol on the top row, leftmost block is 1. The
value increments by 1 from left to right. At the end of the row, the increment process
continues at the leftmost block of the row below, if any.

R2 The rightmost block in each row carries an additional cue, a black dot, indicating
the end of the row.

R3 The rightmost block in the bottom row carries an additional large black rectangle indicating
that this is the last block.



Figure 1(b) shows an example of such a barcode table. To verify that a table of barcodes
is in the correct arrangement, User simply needs to verify the continuity of the counter,
that every row but the last ends with a small dot, and that the last barcode ends with a big dot. It
is easy to verify that, under the above rules, any insertion, deletion or rearrangement
of the barcodes can be detected by visual inspection.
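Rules R1 to R3 and the corresponding visual check can be sketched as follows. This is only an illustration under our own modelling assumptions: each cue is represented as a tuple (counter, end-of-row mark, end-of-table mark), and the names `table_cues` and `arrangement_ok` are hypothetical. The second function mimics what User inspects by eye.

```python
def table_cues(rows, cols):
    """Assign a cue (counter, end_of_row, end_of_table) to every cell per R1-R3."""
    grid, counter = [], 1
    for r in range(rows):
        row = []
        for c in range(cols):
            last_col = c == cols - 1
            # R1: running counter; R2: dot on the rightmost block of each row;
            # R3: large mark on the rightmost block of the bottom row.
            row.append((counter, last_col, last_col and r == rows - 1))
            counter += 1
        grid.append(row)
    return grid


def arrangement_ok(grid):
    """Check counter continuity and the placement of the two end marks."""
    expected = 1
    for r, row in enumerate(grid):
        for c, (value, end_of_row, end_of_table) in enumerate(row):
            is_last_col = c == len(row) - 1
            is_last_cell = is_last_col and r == len(grid) - 1
            if value != expected:
                return False  # insertion, deletion or rearrangement
            if end_of_row != is_last_col or end_of_table != is_last_cell:
                return False  # a row or the table was truncated or extended
            expected += 1
    return bool(grid)
```

Swapping two cells breaks the counter, and removing the bottom row leaves a final block without the end-of-table mark, so both manipulations are rejected by the check.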

6 Alternative Methods 

Besides using visual cues, there are other techniques to ensure that the barcodes are in
the correct order. This section compares our scheme with a few alternatives. In general, our
scheme uses more pixels, since it must carry the visual cue symbols. On the other hand, it has the
following advantages: (1) It does not disrupt the user by requiring the user to scan all the
barcodes. (2) It does not require the user to count the blocks on the terminal's display unit
to verify the current block sequence on the mobile. (3) It allows the barcodes
to be spread across different positions in a scrolling page, or even across different pages. A brief
illustration of the alternative methods is given in Figure 4.

Embedding an HMAC of all blocks. In this method, given a long message m_s, Server
computes an HMAC for the whole m_s and embeds m_s and its tag into a few barcodes. During
authentication, the user first scans across all the barcodes, then Mobile reports whether
the HMAC agrees with the content in the barcodes (Figure 4(a)). If so, Mobile renders the
long message and the user navigates to obtain the required information. The advantages of this
method are (1) the user does not need to verify the visual cue, and (2) the barcode is more
efficient in the sense that it does not need to embed the visual cue.
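A minimal sketch of this alternative is given here, using HMAC based on SHA-1 as in our choice of parameters. The chunking into fixed-size barcode payloads, the placement of the tag at the end, and the names `make_barcodes` and `verify_barcodes` are assumptions made for illustration only.

```python
import hashlib
import hmac


def make_barcodes(message: bytes, key: bytes, chunk_size: int):
    """Server side: append an HMAC tag to the message and split into barcode payloads."""
    tag = hmac.new(key, message, hashlib.sha1).digest()
    payload = message + tag
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]


def verify_barcodes(chunks, key: bytes):
    """Mobile side: reassemble all scanned chunks and check the tag.

    Returns the message if the tag verifies, otherwise None. Every barcode
    must have been scanned, which is the drawback discussed in the text.
    """
    payload = b"".join(chunks)
    tag_len = hashlib.sha1().digest_size
    if len(payload) < tag_len:
        return None
    message, tag = payload[:-tag_len], payload[-tag_len:]
    expected = hmac.new(key, message, hashlib.sha1).digest()
    return message if hmac.compare_digest(tag, expected) else None
```

Because the tag covers the whole message, a reordering or modification of any barcode makes verification fail, but nothing can be shown to the user until the complete set has been captured.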



However, this method has a few disadvantages. Firstly, the scanning process
could be less preferred when the user only wants to browse a subset of the message (e.g. a
user who wants to check a particular record in a list of transactions). Secondly, it is not
easy to navigate using the relatively small display panel of the mobile device. Furthermore,
it is not clear how to extend this method to the models where the mobile device is not trusted.

Encoding location hints in barcode. When the message can be represented as a
table, one may try to secure its authenticity by using the row and column attributes
as location information: given a table m_s, Server first divides it into sub-tables, then
encodes each sub-table together with the corresponding row and column attributes into
barcodes. When Mobile decodes a barcode, it shows the corresponding attributes of the
sub-table, as shown in Figure 4(b).

The advantage of this method is that it does not require the user to scan all the barcodes or
verify visual cues, and the user can readily browse a sub-table of interest. While rearrangement
attacks are prevented because the row and column information is encoded in the barcode,
this method is subject to deletion and duplication attacks: the adversary may remove or
duplicate an entire row of barcodes without being detected. Although this could be patched by
encoding more information (e.g. the total number of barcodes), the verification cost would
increase (the user needs to count the barcode blocks).
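The deletion weakness can be made concrete with a small sketch. The encoding here is an assumption for illustration only: each barcode carries a JSON body with hypothetical fields `row`, `col`, `value` and an optional `total`, authenticated by an HMAC. Every barcode remains individually authentic after a whole row is removed, and only the extra `total` field, which the user must compare against a manual count, exposes the deletion.

```python
import hashlib
import hmac
import json


def seal(key, row, col, value, total=None):
    """Encode one sub-table cell with its location attributes and a MAC tag."""
    body = json.dumps(
        {"row": row, "col": col, "value": value, "total": total}, sort_keys=True
    ).encode()
    return body, hmac.new(key, body, hashlib.sha1).digest()


def authentic(key, barcode):
    """Per-barcode check: the MAC tag matches the body."""
    body, tag = barcode
    return hmac.compare_digest(tag, hmac.new(key, body, hashlib.sha1).digest())


def count_matches(barcodes):
    """The patch from the text: the embedded total must equal the number shown.

    Only meaningful when every barcode was sealed with a total.
    """
    totals = {json.loads(body)["total"] for body, _ in barcodes}
    return totals == {len(barcodes)}
```

With a 2 x 2 table sealed with `total=4`, dropping one row leaves two barcodes that both pass `authentic`, while `count_matches` returns false for the remaining pair.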

7 Implementation 

The usability of our proposed method can be improved using "augmented reality" as
described in the introduction. We implemented a proof-of-concept system using a webcam and
a laptop to simulate the mobile device.

Deploying Machines and Software. We simulate the mobile device and its camera
using a Thinkpad X200 notebook (Intel Core 2 Duo 2.26GHz CPU and 2GB memory)
equipped with an inexpensive USB webcam. To simulate the computing power of a typical
mobile device, we allocate only 10% CPU time and 128MB of memory to our program. We
use a Dell desktop machine with an Intel Core 2 Duo 2.33GHz CPU, 4GB of memory
and Windows XP SP3 to simulate the network terminal. The resolution of the webcam is
640 x 480 pixels with a maximum frame rate of 30 frames per second. We tested the system
on three different display units: (1) a 19 inch flat TFT monitor in Dell model Optiplex
755; (2) a 15 inch flat TFT Dell UltraSharp monitor; and (3) a 15 inch Dell CRT monitor.
All settings of the display units, such as brightness and resolution, are reset to their
defaults. In the following sections, we call these three display units monitor 1, monitor 2 and
monitor 3 respectively.

We use OpenCV libraries [ ] for basic image processing operations and for interfacing with the
camera.

Choice of Parameters. We use AES with a 128-bit key as the encryption scheme, HMAC
based on SHA-1 as the message authentication code, and calculator fonts of numeric digits as
visual cue symbols. We use a (63, 36, 11)-BCH error correcting code [14,4] to correct errors.
That is, for every 36 bits, we add 27 bits of redundancy and are able to correct 5 error
bits. However, to prevent modification of the visual cue, we refuse to decode if there are more
than 3 error bits.
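The arithmetic behind these parameters can be written out as a short sketch. The constants come from the text; the standard relation that a code with minimum distance d corrects floor((d - 1) / 2) errors is used to recover the figure of 5. The function name `accept_block` is hypothetical, and the number of corrected errors is assumed to be reported by whatever BCH decoder is used.

```python
N, K, D = 63, 36, 11           # (n, k, d) of the BCH code from the text

REDUNDANCY = N - K             # 27 redundant bits per 36 data bits
CORRECTABLE = (D - 1) // 2     # up to 5 bit errors are correctable in principle
REJECT_ABOVE = 3               # stricter policy: refuse to decode beyond 3 errors


def accept_block(errors_corrected: int) -> bool:
    """Decide whether to accept a decoded 63-bit block.

    The decoder could repair up to CORRECTABLE errors, but blocks needing
    more than REJECT_ABOVE corrections are refused, so that a deliberately
    altered visual cue is not silently "repaired" into a valid codeword.
    """
    return 0 <= errors_corrected <= REJECT_ABOVE
```

The gap between the 5 correctable errors and the threshold of 3 trades some tolerance to channel noise for resistance to deliberate modification.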

Fig. 5. The performance of implementation. (a) Histogram of the displacement. (b) The error
rate of the three monitors over different frames.

Image Processing Issues. We use an oversampling technique to reduce the noise of a
captured image: one bit in the barcode is rendered using 2x2 pixels. Let us call a group of
2x2 pixels a "superpixel". Such oversampling reduces the noise due to misalignment
and mitigates other artifacts, but it also reduces the channel capacity by a factor of 4.
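The superpixel idea can be sketched as follows. Rendering each bit as a 2x2 group follows the text; reading a superpixel back by a majority vote over its four binarized pixels is an assumption made for this illustration, and the names `render` and `read_back` are hypothetical.

```python
def render(bits, scale=2):
    """Expand each bit of a 2D bit array into a scale x scale superpixel."""
    image = []
    for row in bits:
        wide = [b for b in row for _ in range(scale)]
        image.extend(list(wide) for _ in range(scale))
    return image


def read_back(image, scale=2):
    """Recover one bit per superpixel by majority vote over its pixels.

    A tie (exactly half of the pixels set) is read as 0 in this sketch.
    """
    bits = []
    for r in range(0, len(image), scale):
        row = []
        for c in range(0, len(image[0]), scale):
            ones = sum(
                image[r + dr][c + dc] for dr in range(scale) for dc in range(scale)
            )
            row.append(1 if 2 * ones > scale * scale else 0)
        bits.append(row)
    return bits
```

With this reading rule a single corrupted pixel inside a superpixel does not change the recovered bit, at the cost of spending four display pixels per bit.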

We use the landmark-mapping [i] method for image registration. That is, after Server
generates the barcodes, it superimposes on the barcode image a set of 2D points P, called
control points, whose positions are known to Mobile.

After Mobile captures a screenshot, it extracts the control points and finds the best
geometric transformation that maps the extracted control points to their original locations.
In our implementation, we find the best linear transformation that matches the points. The
transformation is then applied to the barcode image.
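One way to realize this registration step is a least-squares fit, sketched here in plain Python. It assumes an affine model (a linear map plus a translation) and at least three non-collinear control points; these choices, and the names `solve3`, `fit_affine` and `apply_affine`, are ours for illustration and are not taken from the implementation.

```python
def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(a[r][i]))
        a[i], a[p] = a[p], a[i]
        for r in range(i + 1, 3):
            f = a[r][i] / a[i][i]
            for c in range(i, 4):
                a[r][c] -= f * a[i][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (a[i][3] - sum(a[i][c] * x[c] for c in range(i + 1, 3))) / a[i][i]
    return x


def fit_affine(extracted, original):
    """Least-squares affine map taking extracted control points to original ones.

    Returns [[a, b, c], [d, e, f]] with x' = a*x + b*y + c and y' = d*x + e*y + f,
    obtained from the normal equations of the two coordinate-wise fits.
    """
    rows = [(x, y, 1.0) for x, y in extracted]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for k in (0, 1):
        atb = [sum(r[i] * o[k] for r, o in zip(rows, original)) for i in range(3)]
        params.append(solve3(ata, atb))
    return params


def apply_affine(params, point):
    """Map one captured point back into the coordinates of the original image."""
    x, y = point
    return tuple(a * x + b * y + c for a, b, c in params)
```

Once the parameters are fitted from the control points, `apply_affine` is applied to every sampled location of the captured barcode to bring it back into alignment with the original grid.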

8 Performance 

In this section we measure the performance of our proof-of-concept implementation in terms 
of error rate, frame rate and channel capacity. 

Image Registration Error. To measure the accuracy of our image registration, we first
generate an image of many blue points together with the red control points. The image is displayed
on the three display units and captured by the camera. Image registration is then carried
out and the displacement of the blue points is measured. Here we use the Euclidean distance
to measure the amount of displacement.

Our camera is able to capture a region of around 20 blue points. Figure 5(a) shows the 
histogram of the displacement of all the blue points on monitor 1. Note that the average 
displacement is less than 1 pixel. The image registration algorithm can be further refined 
by incorporating more effective and efficient known techniques. 
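The displacement measurement can be sketched as follows, assuming the registered and reference positions of the blue points are available as matched lists of (x, y) pixel coordinates. The function names are hypothetical.

```python
import math


def displacements(registered, reference):
    """Euclidean distance, in pixels, between each registered point and its reference."""
    return [
        math.hypot(px - qx, py - qy)
        for (px, py), (qx, qy) in zip(registered, reference)
    ]


def mean_displacement(registered, reference):
    """Average displacement over all matched points."""
    d = displacements(registered, reference)
    return sum(d) / len(d)
```

The list returned by `displacements` is what a histogram such as the one in Figure 5(a) would be built from.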

Noise Level in Capturing Superpixels. We now measure the probability of error in 
reading a superpixel. The camera is able to capture a block that consists of around 200 x 150 
superpixels at a time. 

After registration, we count the mismatched superpixels between the registered image
and the original image. Thirty measurements are taken for each of the three display units.
Figure 5(b) shows the error for each measurement.
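The per-measurement error can be computed as in the following sketch, which assumes that both images have already been reduced to equally sized 2D arrays of superpixel bits. The function name is hypothetical.

```python
def superpixel_error_rate(registered, original):
    """Fraction of superpixels whose value differs between the two images."""
    total = 0
    mismatches = 0
    for reg_row, orig_row in zip(registered, original):
        for a, b in zip(reg_row, orig_row):
            total += 1
            if a != b:
                mismatches += 1
    return mismatches / total
```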



Frame Rate. The frame rate of our implementation is over 15 frames per second when running
on the laptop. Although the implementation has not been tested on a mobile device, we
believe a typical mobile device with similar processing power could achieve more than
10 frames per second, which is acceptable for most applications.

Capacity of Visual Channel. We now calculate the size of the payload (the size of
m_s, the message Server sends to User) that can be embedded in a block that occupies 10000
pixels of Terminal's display unit. Recall that we used 2x2 pixels to encode 1 bit of the
barcode, employed a (63,36,11) BCH error correcting code, and used L-blocks to preserve
the related location. Thus the payload is 10000 × 1/4 × 36/63 × ... ≈ ... bits for such a block.
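The first two factors of this calculation can be checked with a short sketch. The oversampling factor of 4 and the code rate 36/63 come from the text; the overhead of the L-blocks is represented by a hypothetical parameter `cue_factor`, whose default of 1.0 means the result is only an upper bound that ignores that overhead.

```python
def payload_bits(pixels, pixels_per_bit=4, code_n=63, code_k=36, cue_factor=1.0):
    """Payload bits carried by a block of the given number of display pixels.

    pixels_per_bit: one barcode bit is rendered as a 2x2 superpixel.
    code_k / code_n: rate of the (63, 36, 11) BCH code.
    cue_factor: hypothetical fraction of bits left after L-block overhead.
    """
    raw_bits = pixels / pixels_per_bit
    return raw_bits * code_k / code_n * cue_factor


upper_bound = payload_bits(10000)  # roughly 1428.6 bits before L-block overhead
```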

9 Related Work 

There is an extensive literature on exploiting the camera as an additional visual
channel for communication. Jacobs et al. [15] gave a method that establishes a channel from a
controllable light source to a camera. McCune et al. proposed seeing-is-believing [18], which
carries out authentication and key exchange over a visual channel established between a
device's display and another device's camera. Wong et al. [25] built a prototype on a Nokia
Series 60 handphone that provides 46 bits for authentication over the visual channel.

Data can be transmitted to a camera effectively using 2D barcodes. There are many 2D
barcode designs, for example, QR code [2] and the High Capacity Color Barcode (HCCB) [20],
which uses colored triangles. Many barcodes are designed to encode data in printed copies.
There are also proposals that use other types of sources in the visual channel. Collomosse
et al. proposed "Screen codes" [8] for transferring data from a display to a camera-equipped
mobile device, where the data are encoded as a grid of luminosity fluctuations within an
arbitrary image. A challenging hurdle in using hand-held cameras to establish the channel
is motion blur. A few stabilization algorithms have been developed for handheld cameras [22,19]
and for 2D barcodes [6].

Similar to our scheme, Costanza et al. [9] suggested a technique to embed designs into
barcodes to increase their expressiveness and to give them visual meaning. These systems
recognize the barcodes based on the topology, rather than the geometry, of the codes [10],
and were initially developed for tracking objects in tangible user interfaces and augmented
reality applications [11]. Augmented reality has been exploited to enhance user experience in
many applications, including education [17], gaming [23], and outdoor activities [24]. Rekimoto
et al. [21] used 2D barcodes as visual tags in an augmented reality environment, where
a camera captures the barcodes on physical objects and links them to their information.

10 Conclusion 

In this paper, we investigated how a visual channel can be deployed to enhance the security of
communication between server and user in various settings. We pointed out that although
authentication of an individual barcode can be easily carried out, the interesting technical
challenge lies in verifying the relationships among several barcodes. This led us to
look into the problem of "subregion authentication", where a user wants to verify selected
small pieces of data within a large dataset. Although there are a few methods to overcome the
problem, they introduce disruptions during the interactive session and are thus less
user-friendly. To achieve seamless interactions, we proposed using a visual cue to bind location
information to the barcode, so as to aid the user in visually verifying the data.

Our protocols demonstrate that the visual channel, "enhanced" with the visual cue,
together with the mobile device's input/output devices, provides more flexibility in
designing secure protocols. Viewed from another perspective, our investigation highlights
limitations of the visual channel, for instance, the observation that confidentiality is difficult to
achieve in the setting where either the mobile device or the terminal could be dishonest.
Our solution serves as an interesting example where security is achieved by coupling
a computer's processing power with the human perceptual system. The design of our barcode also
serves as an interesting application of fragile watermarking.

To demonstrate the concept, we built a system that simulates the mobile device using
a webcam and a laptop. The performance of the system is promising. Although we have not yet
implemented the framework on an actual mobile device, we believe that the processing power
of many current mobile devices is sufficient to provide seamless interactions.
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